{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "rebalancing.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f8ee3tlP8dhK",
        "colab_type": "text"
      },
      "source": [
        "## Rebalancing Design Pattern\n",
        "\n",
        "The Rebalancing Design Pattern provides various approaches for handling datasets that are inherently imbalanced: datasets where one label accounts for the large majority of examples, leaving far fewer examples of the other labels."
      ]
    },
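    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "To make the problem concrete, here is a minimal sketch using made-up labels. The 990/10 class split and the always-predict-the-majority baseline are assumptions chosen for illustration; they are not taken from the datasets used below. It shows how accuracy alone can look strong even when the minority class is never detected."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import numpy as np\n",
        "\n",
        "# Hypothetical labels: 990 majority-class (0) and 10 minority-class (1) examples\n",
        "example_labels = np.array([0] * 990 + [1] * 10)\n",
        "\n",
        "# A trivial baseline that always predicts the majority class\n",
        "example_predictions = np.zeros_like(example_labels)\n",
        "\n",
        "# Overall accuracy is dominated by the majority class\n",
        "accuracy = (example_predictions == example_labels).mean()\n",
        "\n",
        "# Recall on the minority class: share of true minority examples predicted as minority\n",
        "minority_recall = (example_predictions[example_labels == 1] == 1).mean()\n",
        "\n",
        "print(f'accuracy: {accuracy:.3f}')\n",
        "print(f'minority recall: {minority_recall:.3f}')"
      ],
      "execution_count": 0,
      "outputs": []
    },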
    {
      "cell_type": "code",
      "metadata": {
        "id": "9OUkQ-Q6kNXl",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import itertools\n",
        "import math\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import tensorflow as tf\n",
        "import xgboost as xgb\n",
        "\n",
        "from tensorflow import keras\n",
        "from tensorflow.keras import Sequential\n",
        "\n",
        "from sklearn.metrics import confusion_matrix\n",
        "from sklearn.preprocessing import MinMaxScaler\n",
        "from sklearn.utils import shuffle\n",
        "from google.cloud import bigquery"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c6vFDbUDjGhw",
        "colab_type": "text"
      },
      "source": [
        "#### Downsampling\n",
        "\n",
        "To demonstrate downsampling, we'll be using this [synthetic fraud detection](https://www.kaggle.com/ntnu-testimon/paysim1) dataset from Kaggle. We've made a version of it available in a public Cloud Storage bucket."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "HTPPRDgX7bWi",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 68
        },
        "outputId": "238a7a02-9c11-4fb9-9395-0cd48be76f7f"
      },
      "source": [
        "# Download the data (we'll preview it in the next cell)\n",
        "!gsutil cp gs://ml-design-patterns/fraud_data_kaggle.csv ."
      ],
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Copying gs://ml-design-patterns/fraud_data_kaggle.csv...\n",
            "- [1 files][470.7 MiB/470.7 MiB]                                                \n",
            "Operation completed over 1 objects/470.7 MiB.                                    \n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bgX95k94kF_u",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 224
        },
        "outputId": "51dc58bc-bd24-4cfe-e960-49f5e4eb5712"
      },
      "source": [
        "fraud_data = pd.read_csv('fraud_data_kaggle.csv')\n",
        "fraud_data.head()"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>step</th>\n",
              "      <th>type</th>\n",
              "      <th>amount</th>\n",
              "      <th>nameOrig</th>\n",
              "      <th>oldbalanceOrg</th>\n",
              "      <th>newbalanceOrig</th>\n",
              "      <th>nameDest</th>\n",
              "      <th>oldbalanceDest</th>\n",
              "      <th>newbalanceDest</th>\n",
              "      <th>isFraud</th>\n",
              "      <th>isFlaggedFraud</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>PAYMENT</td>\n",
              "      <td>9839.64</td>\n",
              "      <td>C1231006815</td>\n",
              "      <td>170136.0</td>\n",
              "      <td>160296.36</td>\n",
              "      <td>M1979787155</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1</td>\n",
              "      <td>PAYMENT</td>\n",
              "      <td>1864.28</td>\n",
              "      <td>C1666544295</td>\n",
              "      <td>21249.0</td>\n",
              "      <td>19384.72</td>\n",
              "      <td>M2044282225</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>1</td>\n",
              "      <td>TRANSFER</td>\n",
              "      <td>181.00</td>\n",
              "      <td>C1305486145</td>\n",
              "      <td>181.0</td>\n",
              "      <td>0.00</td>\n",
              "      <td>C553264065</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>1</td>\n",
              "      <td>CASH_OUT</td>\n",
              "      <td>181.00</td>\n",
              "      <td>C840083671</td>\n",
              "      <td>181.0</td>\n",
              "      <td>0.00</td>\n",
              "      <td>C38997010</td>\n",
              "      <td>21182.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>1</td>\n",
              "      <td>PAYMENT</td>\n",
              "      <td>11668.14</td>\n",
              "      <td>C2048537720</td>\n",
              "      <td>41554.0</td>\n",
              "      <td>29885.86</td>\n",
              "      <td>M1230701703</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   step      type    amount  ... newbalanceDest  isFraud  isFlaggedFraud\n",
              "0     1   PAYMENT   9839.64  ...            0.0        0               0\n",
              "1     1   PAYMENT   1864.28  ...            0.0        0               0\n",
              "2     1  TRANSFER    181.00  ...            0.0        1               0\n",
              "3     1  CASH_OUT    181.00  ...            0.0        1               0\n",
              "4     1   PAYMENT  11668.14  ...            0.0        0               0\n",
              "\n",
              "[5 rows x 11 columns]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3DTxNMqpoiUK",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Drop a few columns we won't use for this demo\n",
        "fraud_data = fraud_data.drop(columns=['nameOrig', 'nameDest', 'isFlaggedFraud'])\n",
        "fraud_data = pd.get_dummies(fraud_data)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QchBWbk2sghr",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Split into separate dataframes\n",
        "fraud = fraud_data[fraud_data['isFraud'] == 1]\n",
        "not_fraud = fraud_data[fraud_data['isFraud'] == 0]\n",
        "\n",
        "# Take a random sample of non-fraud data\n",
        "# The .005 frac will give us around an 80/20 split of not-fraud/fraud samples\n",
        "# You can experiment with this value\n",
        "not_fraud_sample = not_fraud.sample(random_state=2, frac=.005)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nwKoDKAtsoEk",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Put the data back together and shuffle\n",
        "fraud_data = pd.concat([not_fraud_sample,fraud])\n",
        "fraud_data = shuffle(fraud_data, random_state=2)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "R_O9WfwHs72-",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 68
        },
        "outputId": "128d71f4-429f-494a-d5c8-e9b8a8d755cc"
      },
      "source": [
        "# Look at our data balance after downsampling\n",
        "fraud_data['isFraud'].value_counts()"
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0    31772\n",
              "1     8213\n",
              "Name: isFraud, dtype: int64"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 7
        }
      ]
    },
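    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The `.005` sampling fraction used above was picked by hand. As a rough sketch, the fraction needed for a target class balance can be derived from the class counts. The helper name `majority_sample_frac` is hypothetical, and the non-fraud total below is an approximation inferred from the sample size rather than a value read from the data. In this notebook you could pass `len(fraud)` and `len(not_fraud)` instead. For an 80/20 split the result comes out slightly above `.005`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def majority_sample_frac(n_minority, n_majority, minority_share):\n",
        "    '''Fraction of majority-class rows to keep so that the minority class\n",
        "    makes up `minority_share` of the downsampled data.'''\n",
        "    n_majority_to_keep = n_minority * (1 - minority_share) / minority_share\n",
        "    return min(1.0, n_majority_to_keep / n_majority)\n",
        "\n",
        "# 8,213 fraud rows, as counted above. The non-fraud total is approximate:\n",
        "# it is implied by the 31,772 rows sampled at frac=.005 (1 in 200).\n",
        "n_fraud = 8213\n",
        "n_not_fraud = 31772 * 200\n",
        "\n",
        "frac = majority_sample_frac(n_fraud, n_not_fraud, minority_share=0.2)\n",
        "print(f'frac for an 80/20 split: {frac:.4f}')"
      ],
      "execution_count": 0,
      "outputs": []
    },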
    {
      "cell_type": "code",
      "metadata": {
        "id": "V44-7J4TkabP",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "train_test_split = int(len(fraud_data) * .8)\n",
        "\n",
        "train_data = fraud_data[:train_test_split]\n",
        "test_data = fraud_data[train_test_split:]\n",
        "\n",
        "train_labels = train_data.pop('isFraud')\n",
        "test_labels = test_data.pop('isFraud')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SjD6MEDzkSUU",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "model = xgb.XGBRegressor(\n",
        "    objective='reg:squarederror'  # current name for the deprecated 'reg:linear'\n",
        ")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Md8FAfE5pEdX",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 173
        },
        "outputId": "770e7cb4-c542-48b5-a14b-967f9297b70e"
      },
      "source": [
        "model.fit(train_data.values, train_labels)"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "[17:42:39] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.\n"
          ],
          "name": "stdout"
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
              "             colsample_bynode=1, colsample_bytree=1, gamma=0,\n",
              "             importance_type='gain', learning_rate=0.1, max_delta_step=0,\n",
              "             max_depth=3, min_child_weight=1, missing=None, n_estimators=100,\n",
              "             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,\n",
              "             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n",
              "             silent=None, subsample=1, verbosity=1)"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 10
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CSaBKqwspHK1",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Get some test predictions\n",
        "y_pred = model.predict(test_data.values)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zWMjMmZzto8r",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# To build a confusion matrix with the scikit-learn utility, we need integer class labels,\n",
        "# so round each regression output to the nearest integer\n",
        "y_pred_formatted = np.round(y_pred).astype(int)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "uNHbcxLoqdRb",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 51
        },
        "outputId": "de700f96-5164-4e13-8a6d-d0afc665bead"
      },
      "source": [
        "cm = confusion_matrix(test_labels.values, y_pred_formatted)\n",
        "print(cm)"
      ],
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "[[6360   43]\n",
            " [  82 1512]]\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "688jagiQqhLW",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Adapted from the sklearn docs\n",
        "# https://scikit-learn.org/0.18/auto_examples/model_selection/plot_confusion_matrix.html\n",
        "def plot_confusion_matrix(cm, classes,\n",
        "                          normalize=False,\n",
        "                          title='Confusion matrix',\n",
        "                          cmap=plt.cm.Blues):\n",
        "    \"\"\"\n",
        "    This function plots the confusion matrix.\n",
        "    Normalization can be applied by setting `normalize=True`.\n",
        "    \"\"\"\n",
        "    # Normalize before drawing so the cell colors match the displayed values\n",
        "    if normalize:\n",
        "        cm = np.round(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], 3)\n",
        "\n",
        "    plt.imshow(cm, interpolation='nearest', cmap=cmap)\n",
        "    plt.title(title)\n",
        "    plt.colorbar()\n",
        "    tick_marks = np.arange(len(classes))\n",
        "    plt.xticks(tick_marks, classes, rotation=45)\n",
        "    plt.yticks(tick_marks, classes)\n",
        "\n",
        "    thresh = cm.max() / 2.\n",
        "    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n",
        "        plt.text(j, i, cm[i, j],\n",
        "                 horizontalalignment=\"center\",\n",
        "                 color=\"white\" if cm[i, j] > thresh else \"black\")\n",
        "\n",
        "    plt.tight_layout()\n",
        "    plt.ylabel('True label')\n",
        "    plt.xlabel('Predicted label')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "03cAxFElqhxL",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 311
        },
        "outputId": "520be4db-5382-4272-8bf9-689af83d5072"
      },
      "source": [
        "# With downsampling, the model correctly classifies about 95% of fraud examples, close to the 99% it achieves on non-fraud examples\n",
        "# You can compare this by training a model on the full dataset if you'd like (it'll take a long time to train given the size)\n",
        "classes = ['not fraud', 'fraud']\n",
        "plot_confusion_matrix(cm, classes, normalize=True)"
      ],
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVsAAAEmCAYAAADMczPyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3dd5wW1d338c93WZpIR5GmAmJBY0GsiSUaKygmsUVjUHkeb43RmMQkmobRFFPuW2OMMXYsUUTjbY1IjMbyxIJYwQIRFBYbRUT6Lr/njzkLF8uWC9m9rr12v29f89qZM2dmzrW4vz37mzNnFBGYmVnTKit2A8zMWgMHWzOzAnCwNTMrAAdbM7MCcLA1MysAB1szswJwsLVGJ6mjpPslLZI0YSPOc7KkRxqzbcUiaT9Jbxa7HVY88jjb1kvSScB3ge2BxcBLwC8j4qmNPO8pwDnAvhFRudENbeYkBTAkImYUuy3WfLln20pJ+i5wOfAroDewJXAVMKoRTr8V8FZrCLT5kFRe7DZYMxARXlrZAnQFPgWOq6dOe7JgPDctlwPt074DgTnA94APgfeA09K+nwMrgVXpGmOAi4Bbc869NRBAedo+FXibrHc9Ezg5p/ypnOP2BZ4HFqWv++bsexy4BHg6necRoFcdn626/T/Iaf8xwJHAW8AC4Ec59fcE/g18nOpeCbRL+55In2VJ+rwn5Jz/h8D7wC3VZemYwekaw9J2X+Aj4MBi/7/hpekW92xbp32ADsA99dT5MbA3sCuwC1nA+UnO/i3IgnY/soD6J0ndI2IsWW95fERsGhHX19cQSZ2AK4AjIqIzWUB9qZZ6PYAHU92ewP8AD0rqmVPtJOA0YHOgHXB+PZfegux70A/4GXAt8HVgd2A/4KeSBqa6VcB3gF5k37uDgW8CRMT+qc4u6fOOzzl/D7Je/hm5F46I/5AF4lslbQLcCIyLiMfraa+VOAfb1qknMC/q/zP/ZODiiPgwIj4i67GekrN/Vdq/KiIeIuvVbfcZ27Ma2ElSx4h4LyKm1lJnBDA9Im6JiMqIuB14Azgqp86NEfFWRCwD7iT7RVGXVWT56VXAHWSB9A8RsThdfxrZLxki4oWIeCZddxbwF+CAPD7T2IhYkdqzjoi4FpgBPAv0IfvlZi2Yg23rNB/o1UAusS/wTs72O6lszTlqBOulwKYb2pCIWEL2p/eZwHuSHpS0fR7tqW5Tv5zt9zegPfMjoiqtVwfDD3L2L6s+XtK2kh6Q9L6kT8h67r3qOTfARxGxvIE61wI7AX+MiBUN1LUS52DbOv0bWEGWp6zLXLI/gattmco+iyXAJjnbW+TujIiJEXEIWQ/vDbIg1FB7qttU8RnbtCH+TNauIRHRBfgRoAaOqXeYj6RNyfLg1wMXpTSJtWAOtq1QRCwiy1P+SdIxkjaR1FbSEZJ+m6rdDvxE0maSeqX6t37GS74E7C9pS0ldgQurd0jqLWlUyt2uIEtHrK7lHA8B20o6SVK5pBOAocADn7FNG6Iz8Anwaep1n1Vj/wfAoA085x+AyRHxf8hy0VdvdCutWXOwbaUi4r/Jxtj+hOxO+GzgW8D/piq/ACYDrwCvAlNS2We51iRgfDrXC6wbIMtSO+aS3aE/gPWDGRExHxhJNgJiPtlIgpERMe+ztGkDnU92820xWa97fI39FwHjJH0s6fiGTiZpFHA4az/nd4Fhkk5utBZbs+OHGszMCsA9WzOzAnCwNTMrAAdbM7MCcLA1MysAT5CRJ5V3DLXrXOxmWA277bBlsZtgtZgy5YV5EbFZY52vTZetIirXexCvVrHso4kRcXhjXbuxONjmSe060367Bkf1WIE9/eyVxW6C1aJjW9V82m+jROWyvH/+lr/0p4ae7isKB1sza/4kKGtT7FZsFAdbMysNKu1bTA62ZlYa1NB0FM2bg62ZlQC5Z2tm1uSEc7ZmZk
1PTiOYmRVEiacRSrv1ZtZ6SPktDZ5G3STdJekNSa9L2kdSD0mTJE1PX7unupJ0haQZkl6RNCznPKNT/emSRjd0XQdbM2v+qsfZ5rM07A/AwxGxPdl75l4HLgAejYghwKNpG+AIYEhaziB7a0f1C0jHAnuRvQx1bHWArouDrZmVBpXlt9R3iuxNIfuTvY6IiFgZER8Do4Bxqdo41r4yahRwc2SeAbpJ6gMcBkyKiAURsRCYRDYhfJ0cbM2sBGhDgm0vSZNzltxXyQ8kezPJjZJelHRdeiVT74h4L9V5H+id1vuRvcWk2pxUVld5nXyDzMxKQ1neoxHmRcTwOvaVA8OAcyLiWUl/YG3KAICICEmN/gob92zNrPmrHme78TnbOcCciHg2bd9FFnw/SOkB0tcP0/4KYEDO8f1TWV3ldXKwNbMSsEFphDpFxPvAbEnbpaKDgWnAfUD1iILRwL1p/T7gG2lUwt7AopRumAgcKql7ujF2aCqrk9MIZlYaGu+hhnOA2yS1A94GTiPreN4paQzwDlA9n+NDwJHADGBpqktELJB0CfB8qndxRCyo76IOtmZWGhrpoYaIeAmoLad7cC11Azi7jvPcANyQ73UdbM2s+fN8tmZmBeK5EczMmpqnWDQzKwz3bM3MmpgEZaUdrkq79WbWerhna2ZWAM7ZmpkVgHu2ZmZNzONszcwKQ+7Zmpk1LeFga2bW9JSWEuZga2YlQJSVeTSCmVmTcxrBzKwAHGzNzJqac7ZmZk1PztmamRWG0whmZgXgYGtm1tScszUza3rO2ZqZFYjTCGZmhVDasZbS7pebWesgKCsry2tp8FTSLEmvSnpJ0uRU1kPSJEnT09fuqVySrpA0Q9IrkoblnGd0qj9d0uiGrutga2YlQVJeS56+GBG7RsTwtH0B8GhEDAEeTdsARwBD0nIG8OfUlh7AWGAvYE9gbHWArouDrZk1eyK/QLsRed1RwLi0Pg44Jqf85sg8A3ST1Ac4DJgUEQsiYiEwCTi8vgs42LYAh+y7Ay/f81Neu3cs5592yHr7t+zTnYeuPofnxl/IxGu/Tb/Nu63Z94tzRzF5wo+YPOFHHHvomr+Q+PPYk3h2/AU8N/5C/vq7MXTq2K4gn6WleGTiw+y843bsuP02/O63l663f8WKFXz9pBPYcftt2G/fvXhn1qw1+373m1+z4/bbsPOO2zHpkYkAvPXmm+y1+65rls17dOGPf7i8UB+neVCeC/SSNDlnOaPGmQJ4RNILOft6R8R7af19oHda7wfMzjl2Tiqrq7xOvkFW4srKxOUXHM+Is66k4oOPeeq27/PAv17ljbffX1Pn19/5Mrc9+By33f8sB+yxLRefczRjfnozh39hR3bdYQB7nXgp7duW88h132bi09NYvGQ5P/j931i8ZDkAv/neVzjrxAP4/Y2TivUxS0pVVRXnnXs2D/59Ev369+cLe+/ByJFHs8PQoWvq3HTD9XTv1p2pb8zgzvF38OMf/ZBb/zqe16dNY8L4O5jy8lTemzuXIw//Eq9Oe4ttt9uOZ194ac35B2/Vj6OP+XKxPmLhpZxtnublpAdq84WIqJC0OTBJ0hu5OyMiJMVnbWpd3LMtcXvstDX/mT2PWRXzWVVZxYSJUxh54M7r1Nl+UB/+9dybAPzr+bcYeeDnANhh0BY8NWUGVVWrWbp8Ja9Or+DQfXcAWBNoATq0b0tEo/+/12I9/9xzDB68DQMHDaJdu3Ycd8KJPHD/vevUeeD+ezn5lOyeyle+eiyP//NRIoIH7r+X4044kfbt27P1wIEMHrwNzz/33DrHPvbPRxk4aDBbbbVVwT5Tc9BYaYSIqEhfPwTuIcu5fpDSA6SvH6bqFcCAnMP7p7K6yuvkYFvi+m7elTkfLFyzXfHBQvpt1nWdOq++VcGog3YFYNRBu9Bl04706NqJV97KgmvHDm3p2a0TBwzflv5brM3x/+WirzPrH79iu617c9Ud/yrMB2oB5s6toH//tT
+H/fr1p6KiYv06A7I65eXldOnalfnz51NRsf6xc+eue+yE8Xdw/Alfa8JP0Ezln0ao+xRSJ0mdq9eBQ4HXgPuA6hEFo4Hq3473Ad9IoxL2BhaldMNE4FBJ3dONsUNTWZ2aTbCVdKqkvnXs2z4N03hR0uAmuPYsSb0a+7zNxYWX3cN+u2/Dv2//Ifvtvg0VHyykqmo1jz7zBg8/NY3Hbvoe4359Gs++MpOqqtVrjvuvi25l0KE/5o2Z73PsobsX8RNYtZUrV/LgA/fxlWOPK3ZTCq6Rera9gackvQw8BzwYEQ8DlwKHSJoOfCltAzwEvA3MAK4FvgkQEQuAS4Dn03JxKqtTc8rZnkr2G2ZuLfuOAe6KiF/kFir7zioiVtdyTKsw98NF9O+9tjfar3d3Kj5atE6d9z5axInnXwdAp47tOObgXVn06TIAfnv9RH57ffYL+aZfncr0dz9c59jVq4MJE1/gu6MP4Zb7nmnKj9Ji9O3bjzlz1t47qaiYQ79+/davM3s2/fv3p7Kykk8WLaJnz57067f+sX37rj124sN/Z9fdhtG7d29aE6lxHteNiLeBXWopnw8cXEt5AGfXca4bgBvyvXaT9GwlbS3pdUnXSpoq6RFJHdO+XSU9kwYI35O64ccCw4HbUg+2Y865jgTOA86S9Fg695uSbiYLzgMk/TnddZwq6ec5x67psUoaLunxtN4ztWmqpOso4WdTJk99h2223Iyt+vakbXkbjjtsGA8+/so6dXp267TmN/73Tz+McfdmQbOsTPTo2gmAnYb0ZachffnHv7N7BYMGrO3ojzxgZ96a9UEhPk6LMHyPPZgxYzqzZs5k5cqVTBh/ByNGHr1OnREjj+a2W7KRRn+7+y4O+OJBSGLEyKOZMP4OVqxYwayZM5kxYzp77LnnmuPuHH9760wh0OjjbAuuKXu2Q4CvRcT/lXQn8FXgVuBm4JyI+Jeki4GxEXGepG8B50fE5NyTRMRDkq4GPo2I30vaOp17dBr3hqQfR8QCSW2ARyXtHBHrRpx1jQWeioiLJY0AxtRWKQ0LyYaGtN30s34fmlRV1Wq+85s7uf+qs2lTJsbd+wyvv/0+Pz1rBFOmvcuD/3qV/YcP4eJzjiYCnpoyg/N+fScAbcvb8I8bzgNg8afLOf3H46iqWo0krrv4FDp36oiU5XzP/dX4Yn7MklJeXs5lf7iSo0YcRlVVFaNPPZ2hO+7IxRf9jGG7D2fkUUdz6uljOP3UU9hx+23o3r0Ht9x2BwBDd9yRrx53PLvtPJTy8nIuv+JPtGnTBoAlS5bwz39M4sqr/lLMj1c8zTeO5kVNcZc5BcRJ6WkMJP0QaAv8EXg1IrZM5YOBCRExLPU61wu2qd5FrBtsH4uIgTn7zyQLiuVAH7JgfoekWcDwiJgnaTjw+4g4UNJLwFfSnxRIWgBsGxHz6vpMZZtsHu23O34jvivWFBY+f2Wxm2C16NhWLzQw/GqDtO89JPqd/Ie86s68bESjXruxNGXPdkXOehXQsa6Kn8GS6hVJA4HzgT0iYqGkm4AOaXcla1MlHTCzkiRlaa9SVtDRCBGxCFgoab9UdApQPaZoMdD5M5y2C1nwXSSpN9mzzNVmAdW30b+aU/4EcBKApCOAep9pNrNia/LHdZtcMUYjjAaulrQJ2ZCK01L5Tal8GbBPRCzL52QR8bKkF4E3yB6fezpn98+B6yVdAjxeo/x2SVOB/we8+9k/jpkVQjOOo3lpkmAbEbOAnXK2f5+z/hKwdy3H3A3cXcf5Lqrr3Kns1DqOexLYtpby+WSDkM2sRDTnXms+mtM4WzOzWknQpo2DrZlZkyvxjq2DrZmVBqcRzMyamtyzNTNrcn6VuZlZgbhna2ZWAM7Zmpk1Nedszcyanij9uREcbM2sJDiNYGZWACUeax1szawEyD1bM7Mml42zdbA1M2tyJd6xdbA1s9LgNIKZWVNrAeNsS/thYzNrFbJxtmV5LXmdT2oj6UVJD6TtgZ
KelTRD0nhJ7VJ5+7Q9I+3fOuccF6byNyUd1tA1HWzNrCRI+S15+jbwes72b4DLImIbYCEwJpWPARam8stSPSQNBU4EdgQOB66S1Ka+CzrYmllJaKwXPkrqD4wArkvbAg4C7kpVxgHHpPVRaZu0/+BUfxRwR0SsiIiZwAxgz/qu62BrZs1fnr3aPHu2lwM/AFan7Z7AxxFRmbbnAP3Sej+yF8mS9i9K9deU13JMrRxszazZqx5nm88C9JI0OWc5Y815pJHAhxHxQqE/g0cjmFlJKMs/ITsvIobXse/zwNGSjgQ6AF2APwDdJJWn3mt/oCLVrwAGAHMklQNdgfk55dVyj6m9/fm23sysmBojjRARF0ZE/4jYmuwG1z8j4mTgMeDYVG00cG9avy9tk/b/MyIilZ+YRisMBIYAz9V37Tp7tpL+CEQ9jT63/o9lZtY41PRzI/wQuEPSL4AXgetT+fXALZJmAAvIAjQRMVXSncA0oBI4OyKq6rtAfWmEyRvZeDOzRtOmkedGiIjHgcfT+tvUMpogIpYDx9Vx/C+BX+Z7vTqDbUSMy92WtElELM33xGZmjanFP0EmaR9J04A30vYukq5q8paZmSUiG5GQz3/NVT43yC4HDiO7A0dEvAzs35SNMjNbh0SbsvyW5iqvoV8RMbtGcrreRLCZWWMr9TRCPsF2tqR9gZDUlvWfKTYza1Jig8bZNkv5pBHOBM4mexRtLrBr2jYzK5hGnoim4Brs2UbEPODkArTFzKxWUum/yjyf0QiDJN0v6SNJH0q6V9KgQjTOzKxamZTX0lzlk0b4K3An0AfoC0wAbm/KRpmZ1aQ8l+Yqn2C7SUTcEhGVabmVbAIHM7OCaaz5bIulvrkReqTVv0u6ALiDbK6EE4CHCtA2MzMgC7TNeQxtPuq7QfYCWXCt/oT/lbMvgAubqlFmZjU1405rXuqbG2FgIRtiZlaf5pwiyEdeT5BJ2gkYSk6uNiJubqpGmZnlyh5qKHYrNk6DwVbSWOBAsmD7EHAE8BTgYGtmBdOch3XlI5/RCMcCBwPvR8RpwC5kr4YwMysIqfTH2eaTRlgWEaslVUrqAnzIuu/eMTNrcs04juYln2A7WVI34FqyEQqfAv9u0laZmdXQ4m+QRcQ30+rVkh4GukTEK03bLDOztUQLHmcraVh9+yJiStM0ycyshmY+o1c+6uvZ/nc9+wI4qJHb0qztusOWPP3MH4vdDKvhlXcXFbsJViAtNo0QEV8sZEPMzOqTz9Cp5iyvhxrMzIpJNP6rzAvNwdbMSkKJx9qS75mbWSuQvfJm46dYlNRB0nOSXpY0VdLPU/lASc9KmiFpvKR2qbx92p6R9m+dc64LU/mbkg5r6DPk86YGSfq6pJ+l7S0l7dnQcWZmjalM+S0NWAEcFBG7kL1P8XBJewO/AS6LiG2AhcCYVH8MsDCVX5bqIWkocCKwI3A4cJWkNvW2P4/PeBWwD/C1tL0Y+FMex5mZNYrqnG0+S30i82nabJuW6tFVd6XyccAxaX1U2ibtP1hZ93kUcEdErIiImcAMoN5OaD7Bdq+IOBtYnhq7EGiXx3FmZo2mLM8F6CVpcs5yRu55JLWR9BLZ1AOTgP8AH0dEZaoyh+xt4qSvswHS/kVAz9zyWo6pVT43yFal7nGkhm4GrM7jODOzRrMBw2znRcTwunZGRBWwa5qG4B5g+41vXcPy6dleQdagzSX9kmx6xV81aavMzHIozxm/NmTWr4j4GHiMLE3aTVJ157M/UJHWK0gTb6X9XYH5ueW1HFOrBoNtRNwG/AD4NfAecExETMjz85iZNYo2Zfkt9ZG0WerRIqkjcAjwOlnQPTZVGw3cm9bvS9uk/f+MiEjlJ6bRCgOBIcBz9V07n8nDtwSWAvfnlkXEuw0da2bWGLI3NTTKQNs+wLiUGi0D7oyIByRNA+6Q9AvgReD6VP964BZJM4AFZCMQiIipku4EpgGVwNkpPVGnfHK2D7L2xY8dgIHAm2
RDHszMCqIxYm2asXC3WsrfppbRBBGxHDiujnP9EvhlvtfOZ4rFz+Vup9nAvllHdTOzxpffGNpmbYMf142IKZL2aorGmJnVRkCbljrrVzVJ383ZLAOGAXObrEVmZrVoDT3bzjnrlWQ53LubpjlmZrVrsfPZQvakBdA5Is4vUHvMzNaTjUYodis2Tn2vxSmPiEpJny9kg8zM1qOWPZ/tc2T52Zck3QdMAJZU74yIvzVx28zMgBbes83RgezxtINYO942AAdbMyuYEk/Z1htsN08jEV5jbZCtFk3aKjOzHEIteuhXG2BT1g2y1RxszaxwWvhDDe9FxMUFa4mZWT0aaW6Eoqkv2Jb2JzOzFkO07JztwQVrhZlZA1rs0K+IWFDIhpiZ1UWU/qvAN3giGjOzglMLf1zXzKy5KO1Q62BrZiWgVUyxaGbWHJR4rHWwNbNSIOdszcyamkcjmJkVSEt+gszMrHnw0C8zs6bXEtIIpd5+M2slJOW1NHCOAZIekzRN0lRJ307lPSRNkjQ9fe2eyiXpCkkzJL0iaVjOuUan+tMljW6o/Q62ZlYSypTf0oBK4HsRMRTYGzhb0lDgAuDRiBgCPJq2AY4AhqTlDODPkAVnYCywF7AnMLY6QNfZ/s/wmc3MCipLIyivpT4R8V5ETEnri4HXgX7AKGBcqjYOOCatjwJujswzQDdJfYDDgEkRsSAiFgKTgMPru7ZztmZWEjbg/lgvSZNztq+JiGvWP5+2BnYDngV6R8R7adf7QO+03g+YnXPYnFRWV3mdHGzNrAQI5T87wryIGF7v2aRNgbuB8yLik9xcb0SEpEZ/G43TCGbW7FXPjZDP0uC5pLZkgfa2nLeEf5DSA6SvH6byCmBAzuH9U1ld5XVysDWz5k9ZGiGfpd7TZF3Y64HXI+J/cnbdB1SPKBgN3JtT/o00KmFvYFFKN0wEDpXUPd0YOzSV1clpBDMrCY30TMPngVOAVyW9lMp+BFwK3ClpDPAOcHza9xBwJDADWAqcBtnLFSRdAjyf6l3c0AsX3LNtAR6Z+DC77Lg9O+0whN//9tL19q9YsYJTTjqRnXYYwv6f35t3Zs0C4J1Zs+jRZRP2Gr4bew3fjXPOPnPNMWN/+mOGDNqSzbp3LtTHaFG6diznc/03ZecBm9Kna/v19rcrF9v16cRO/TZl+z6daNtm3UhSJth1y85s1bPDmrIendqyU79N2an/pvTv0aHmKVs85flffSLiqYhQROwcEbum5aGImB8RB0fEkIj4UnXgTKMQzo6IwRHxuYiYnHOuGyJim7Tc2FD7HWxLXFVVFd/59rf43/sfYsrLU5kw/g5enzZtnTo33Xg93bp347XXp3POuefxkx9dsGbfoEGDeXbyizw7+UX++Ker15SPGHkUTzz9bME+R0uzVa8OvPX+El6d/Sk9N21Lh7br/qht2aMj8xev5LWKT6lYuJwBNYJn/x4dWLy8cs12eZkY0LMDb7y3hNfmfEq7NqJLhzYF+SzNQWPmbIvFwbbETX7+OQYP3oaBgwbRrl07jj3+BB64/9516jx4/318/ZQsHfXlrx7L4489SkT9N1v33Gtv+vTp02Ttbsk2bd+GFatWs6IyCGD+klV079R2nTod2pXxybIsmC5eXrXO/k3aldG2jVi0dG2wbd+2jOWrVlO5Ovt3W7Sscr1ztnSNkbMtJgfbEje3ooJ+/fuv2e7Xrz9z51bUUie7cVpeXk6Xrl2ZP38+ALNmzWTvPYZx6MEH8vRTTxau4S1Y23KxonLtL7OVlatpVyNNsGzl2gDbfZNy2pSJ8vT405Y9O/Lu/OXr1F++qoqObctoV57V6d6pLe3KW9ePb2OkEYqpZG6QSToXOAuYEhEnN+J5DwTOj4iRjXXOUrFFnz68+Z936NmzJ1OmvMAJx36ZF156jS5duhS7aS3eu/OXs1WvjvTq3I7FyytZWbmaINi8Szs+XrqKVVXr/uVRtRpmzVvGNptvAmS94ZqpiZZM5P
UobrNWMsEW+CbwpYiYU10gqTwiKus5psXr268fFXPWfEuoqJhD3779aqkzm/79+1NZWcknixbRs2dPJNG+fXbzZtiw3Rk0aDDTp7/F7rvXOx7cGrCqMmhfvjYytCsvY2WN4LmqKpjxwVIgCyI9OrWlanWWgujcsZzeXdpTVpbN4VoVwZwFK/h4aSUfp9TCZp1bVwoBqeTnsy2JX42SrgYGAX+XtEjSLZKeBm6RtLWkJyVNScu+6ZgDJT2Qc44rJZ2a1g+X9IakKcBXivCRGs3uw/dgxozpzJo5k5UrV3LXneMZMfLodeocOfIobr0le+z7nrvv4oADD0ISH330EVVVVQDMfPttZsyYzsCBgwr+GVqaT1dU0b5tG9qVZ3/U9uzUlo+XrFqnTnlON61vt/Z8tHglAG9/tIyX313My7MXM3v+cuYtXsmcBSvWOaZNGWzeZe0xrYXyXJqrkujZRsSZkg4Hvgh8CzgK+EJELJO0CXBIRCyXNAS4HaizayapA3AtcBDZ2Lnx9dQ9g2ymHwZsuWVjfZxGVV5ezv9c/keOHnE4Vaur+Mbo0xi6445cfNHPGLb7cEYedTSnnjaGMad+g512GEL37j24+dbbAXj6ySe45OdjKW/blrKyMq648s/06NEDgB9f8APGj7+dpUuXss3AAZx62hh+8rOLivhJS8s785ax/RadQPDR4lUsW7Waft3bs2RFFR8vraRzxzbZCISAT5ZX8c68ZQ2ec6teHdikXTYCoWLhCpavWt3UH6PZyNIIzTmUNkwN3ZVuLiTNIgui3yIb/vbzVN4VuBLYFagCto2ITWrmYiVdCUwGXgKuiIj9U/nRwBkN5WyH7T48nn7m+fqqWBG8OvuTYjfBarHX4G4vNDQ/wYbY4XO7xY33PJZX3X2GdG/UazeWkujZ1mJJzvp3gA+AXcjSItW3cStZN03S+kaBm7Ugpf5anJLI2TagK/BeRKwmewyveqT3O8BQSe0ldQMOTuVvAFtLGpy2v1bQ1prZZ+JxtsV3FTBa0svA9qReb0TMBu4EXktfX0zly8nysA+mG2Qf1nZSM2tefIOsQCJi67R6UY3y6cDOOUU/zNn3A+AHtZzrYbLAbGalojlH0jyUTLA1s9ZLKv3RCA62ZlYSSmlrmv0AAAyGSURBVDvUOtiaWako8WjrYGtmJaB5TzKTDwdbM2v2PBGNmVmhONiamTU9pxHMzArAaQQzs6bW3B8Py4ODrZmVBKcRzMyamGjek8zkoyVMRGNmrUBjzfol6QZJH0p6Laesh6RJkqanr91TuSRdIWmGpFckDcs5ZnSqP13S6Iau62BrZiWhEd+uexNweI2yC4BHI2II8GjaBjgCGJKWM4A/QxacgbHAXsCewNjqAF0XB1szKwmN1bONiCeABTWKRwHj0vo44Jic8psj8wzQTVIf4DBgUkQsiIiFwCTWD+DrcM7WzErCBqRse0manLN9TURc08AxvSPivbT+PtA7rfcDZufUm5PK6iqvk4OtmTV72Q2yvMPtvI15B1lEhKRGfzmj0whm1vzlmULYiBELH6T0AOlr9RtcKoABOfX6p7K6yuvkYGtmJaGJX4tzH1A9omA0cG9O+TfSqIS9gUUp3TAROFRS93Rj7NBUVienEcysNDTSOFtJtwMHkuV255CNKrgUuFPSGLKXxR6fqj8EHAnMAJYCpwFExAJJlwDPp3oXR0TNm27rcLA1sxKgRnstTkTU9Ubtg2sWREQAZ9dxnhuAG/K9roOtmTV7LWBqBAdbMysRJR5tHWzNrCR4IhozswLwfLZmZk1t48bQNgsOtmZWIko72jrYmlmz1xLms3WwNbOS4JytmVkBeDSCmVkhlHasdbA1s9JQ4rHWwdbMmj+JRpsboVgcbM2sNJR2rHWwNbPSUOKx1sHWzEpDiWcRHGzNrPlTI85nWyx+LY6ZWQG4Z2tmJaHEO7YOtmZWGvwEmZlZE8vG2Ra7FRvHwdbMSoODrZlZ03MawcysAH
yDzMysABxszcwKoNTTCIqIYrehJEj6CHin2O1oJL2AecVuhK2nJf27bBURmzXWySQ9TPb9yce8iDi8sa7dWBxsWyFJkyNieLHbYevyv0vL5sd1zcwKwMHWzKwAHGxbp2uK3QCrlf9dWjDnbM3MCsA9WzOzAnCwNTMrAAdbq5ekfMc2WhOR5J/TFsD/iFYrZbYAXpA0qtjtaW0kDZX0Z0nlEbFaKvWHVc3B1uqiiHgfOBf4laQvFbtBrUXqyQpoD/xeUpuICAfc0uZga7WKiNVpdRHwAXC/e7hNT5IiYnVETAUeArYj+2XngFviHGytTpJOBf4HOBv4GXCNpKPTPv/QN4FIYzElnQ+cCbwL7AJckVIK4RxuafI/mq1RSwDdApgQEa9HxO+A84DbJY0KD9BuMpK6AkcAJ0TEfwHfBzoCv6jO4Ra1gfaZONjaGjm9qr1TUQWwTc7+24F/Az+RtEnhW9jySdoUWA70Boal4jeBV4FRwCVFapptJAdbW4eknsAlkn4G3A4MkHSZpD0knU42zeRXI2JpURvaAkk6kCx1APBL4LuS9o2IlcBC4H+BK4vUPNtIflzX1iGpDbAzWY72X8BVwO/I/oz9HPB/I+K14rWw5Ug3wyJn+xjgy8CzwBPA7sCvgfuAEcCXIuLNYrTVNp6DrQEg6SRgekQ8nwLuUOBS4P6IuDrV6RIRnxSznS2RpL0i4tm0fiRwFPAacB1ZGqcbMDciZhavlbaxnEZopWq5GTYYGC9pt4ioAt4AJgE/kPTDVGdxIdvYGkjaHLhQ0m8AIuIh4GHgDLIbYwsi4mkH2tLnYNsK5f75KqkfQERcQjbM63ZJu0fEKrI84c1kuVs8AmHj1fwlFxEfAr8C+ku6JJXdC0wF+gLLCt5IaxJOI7Rikr4DHAB8AtwYEY9JOpOsVzUFOAQ4KCL+U8Rmthg1fsmdSvbC1ZURcbOk4cD3yILr48BpwKkR0VLee9fqOdi2UpJOA0aTBdSngSXANRFxe/rB35wshzu9iM1skSR9Gzge+DFwP/CLiPiNpL5koxA6Ar+MiFeL2ExrZH6VeStRo1fVBegCnEL2dNh8skdDvy+pPfC3iJhctMa2ICltoOoHEST1J/sFdyQwhmzc8vcldY2IHwGnSeoYEU4ftDAOtq1ETqA9G9iKrFfViyxNcETadzLZsK+/FaudLVCniPgUQNJossdvvw58nmy88uclHQE8KGlRRPzGgbZl8g2yFk7SwZL2SesnAXsDV6UbYJ8CW0j6tqSvkOVuL/PwrsaRJu65PK2PAE4FXouIj8l+9p5NVbsAvyV7aMFaKPdsWzBJuwJ3Aruk9MAw4Gjg2wARsTgN6/ou0B04KyJmF6u9LUl6Eu8c4AxJXwO+Cfw7Ij5KVVYAfSXdAnwB+GJEzCpKY60gfIOsBZO0A9njnwuAQWn9NmAT4KjUu0VSOdA5IhYWq60tjaTOwATgPbK/Jp4ku+n43xHxZKqzN9kDC29HxFvFaqsVhtMILdu7ZPOhfhN4OOUCTySbYOYuSe0AIqLSgbZxRcRi4J9kow5ujIgzyJ4KGyFpv1TnmYh42IG2dXCwbdlWAY8BfwW2lXRwmtTkLLKZpW4pZuNagfFkM3WdLmkM8Cey7/sJOTOrWSvhNEILUj28q5YJTjYnG+LVEXggIp6Q1BboFRHvFau9rYWkYWSB9xfAP8jGN1+Xnh6zVsLBtoVIOcIVEbFSUu+I+KDG/q3JxtVuAfw1Ip4ufCtbL0m7kKUVzgHGp/knrBVxsG0B0g2uk4EqYADZoPnDgMoaPdzBwLFkOUT3qgpM0ueAZRExo9htscJzsG0h0pNJ/wLaASMi4pU66pVHRGVBG2dmvkFWyqpnkEo52jnANcB/gP0k9apRtwyykQcFb6iZuWdbqmrMdXAQ8DHZE2Hvkz3I8HhEXCrpeOCNunq6ZlYYfoKsROUE2nOBb5BNibgdcAPZs/e3ShpCNuv//sVqp5ll3LMtYWlKvrvIJj
R5T9JQsmB7Adnk0zuSPZ30bhGbaWY4Z1tSanmVDWSD5JcARMQ0sgcYdo+IjyLicQdas+bBwbZE1MjRDgGIiLnAW8DdOVU3BQYrKXxLzaw2ztmWgBqB9lvAuZKeAf5ONmPXf0t6kWwC8FHAsX5fmFnz4mBbAnIC7dFkk3sfARwE7Al0iYizJI0E2gA3+VU2Zs2Pb5CViPQW3H8D/4iI09OMXV8B9gFmAX+JiKVFbKKZ1cM52xIRERXAecDhkk5Ms3fdSTbka3OgfTHbZ2b1cxqhhETE3yStAH4tiYi4I8303ynNn2pmzZSDbYmJiAclrQaukVQZEXcBDrRmzZxztiVK0iHAfyLi7WK3xcwa5mBrZlYAvkFmZlYADrZmZgXgYGtmVgAOtmZmBeBga2ZWAA62tkEkVUl6SdJrkiZI2mQjznWTpGPT+nVpPt666h4oad/PcI1ZNV8RVF95jTqfbuC1LpJ0/oa20VoHB1vbUMsiYteI2AlYCZyZuzO96XeDRcT/SfPx1uVAYIODrVlz4WBrG+NJYJvU63xS0n3ANEltJP1O0vOSXpH0X5BNFSnpSklvSvoH2ZwOpH2PSxqe1g+XNEXSy5IelbQ1WVD/TupV7ydpM0l3p2s8L+nz6diekh6RNFXSdUCDc/pK+l9JL6Rjzqix77JU/qikzVLZYEkPp2OelLR9Y3wzrWXz47r2maQe7BHAw6loGLBTRMxMAWtRROwhqT3wtKRHgN3I3pM2FOgNTCN7jU/ueTcDrgX2T+fqERELJF0NfBoRv0/1/gpcFhFPSdoSmAjsAIwFnoqIiyWNAMbk8XFOT9foCDwv6e6ImA90AiZHxHck/Syd+1tkbzE+MyKmS9oLuIpsykuzOjnY2obqKOmltP4kcD3Zn/fPRcTMVH4osHN1PhboCgwhe/Hk7RFRBcyV9M9azr838ET1uSJiQR3t+BIwNOdlFF0kbZqu8ZV07IOSFubxmc6V9OW0PiC1dT6wGhifym8F/pausS8wIefannHNGuRgaxtqWUTsmluQgs6S3CLgnIiYWKPekY3YjjJg74hYXktb8ibpQLLAvU9ELJX0ONChjuqRrvtxze+BWUOcs7WmMBE4S1JbAEnbSuoEPAGckHK6fYAv1nLsM8D+kgamY3uk8sVA55x6jwDnVG9Iqg5+TwAnpbIjgO4NtLUrsDAF2u3JetbVyoDq3vlJZOmJT4CZko5L15CkXRq4hpmDrTWJ68jysVMkvQb8heyvqHuA6WnfzWRvnlhHRHwEnEH2J/vLrP0z/n7gy9U3yIBzgeHpBtw01o6K+DlZsJ5Klk5o6O3CDwPlkl4HLiUL9tWWAHumz3AQcHEqPxkYk9o3ley9b2b18qxfZmYF4J6tmVkBONiamRWAg62ZWQE42JqZFYCDrZlZATjYmpkVgIOtmVkB/H8f6SE123TwSgAAAABJRU5ErkJggg==\n",
            "text/plain": [
              "<Figure size 432x288 with 2 Axes>"
            ]
          },
          "metadata": {
            "tags": [],
            "needs_background": "light"
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d6uGXlc8vpYw",
        "colab_type": "text"
      },
      "source": [
        "#### Weighted classes and output bias\n",
        "To demonstrate weighted classes, we'll use a different fraud detection dataset in BigQuery. This one has far fewer minority class examples than the one used in the example above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gpVf0hJITM0G",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# To access BigQuery, you'll need to authenticate to your Cloud account\n",
        "from google.colab import auth\n",
        "auth.authenticate_user()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GxPRFYscybVP",
        "colab_type": "text"
      },
      "source": [
        "We'll take all of the fraud examples from this dataset and a random 5% sample of the non-fraud examples. Then we'll combine and shuffle them and look at the number of examples we have for each class."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NZ_K-m0fLUez",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%%bigquery fraud_df --project sara-cloud-ml\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  `bigquery-public-data.ml_datasets.ulb_fraud_detection`\n",
        "WHERE Class = 1"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Xb4NF_2NwIld",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%%bigquery nonfraud_df --project sara-cloud-ml\n",
        "-- This query will take about a minute to run\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  `bigquery-public-data.ml_datasets.ulb_fraud_detection`\n",
        "WHERE Class = 0\n",
        "AND RAND() < 0.05"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Knqk6iC7w5pb",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 439
        },
        "outputId": "4e56c061-50c3-4764-c91b-e673456e4e4b"
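      },
      "source": [
        "bq_fraud_data = pd.concat([fraud_df, nonfraud_df])\n",
        "# Shuffle before splitting: concat puts all fraud rows first, so an unshuffled split leaves no fraud in the test set\n",
        "bq_fraud_data = shuffle(bq_fraud_data, random_state=22)\n",
        "# Display only: sort_values returns a sorted copy and leaves bq_fraud_data shuffled\n",
        "bq_fraud_data.sort_values(by=['Time'])"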
      ],
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Time</th>\n",
              "      <th>V1</th>\n",
              "      <th>V2</th>\n",
              "      <th>V3</th>\n",
              "      <th>V4</th>\n",
              "      <th>V5</th>\n",
              "      <th>V6</th>\n",
              "      <th>V7</th>\n",
              "      <th>V8</th>\n",
              "      <th>V9</th>\n",
              "      <th>V10</th>\n",
              "      <th>V11</th>\n",
              "      <th>V12</th>\n",
              "      <th>V13</th>\n",
              "      <th>V14</th>\n",
              "      <th>V15</th>\n",
              "      <th>V16</th>\n",
              "      <th>V17</th>\n",
              "      <th>V18</th>\n",
              "      <th>V19</th>\n",
              "      <th>V20</th>\n",
              "      <th>V21</th>\n",
              "      <th>V22</th>\n",
              "      <th>V23</th>\n",
              "      <th>V24</th>\n",
              "      <th>V25</th>\n",
              "      <th>V26</th>\n",
              "      <th>V27</th>\n",
              "      <th>V28</th>\n",
              "      <th>Amount</th>\n",
              "      <th>Class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>13869</th>\n",
              "      <td>4.0</td>\n",
              "      <td>1.229658</td>\n",
              "      <td>0.141004</td>\n",
              "      <td>0.045371</td>\n",
              "      <td>1.202613</td>\n",
              "      <td>0.191881</td>\n",
              "      <td>0.272708</td>\n",
              "      <td>-0.005159</td>\n",
              "      <td>0.081213</td>\n",
              "      <td>0.464960</td>\n",
              "      <td>-0.099254</td>\n",
              "      <td>-1.416907</td>\n",
              "      <td>-0.153826</td>\n",
              "      <td>-0.751063</td>\n",
              "      <td>0.167372</td>\n",
              "      <td>0.050144</td>\n",
              "      <td>-0.443587</td>\n",
              "      <td>0.002821</td>\n",
              "      <td>-0.611987</td>\n",
              "      <td>-0.045575</td>\n",
              "      <td>-0.219633</td>\n",
              "      <td>-0.167716</td>\n",
              "      <td>-0.270710</td>\n",
              "      <td>-0.154104</td>\n",
              "      <td>-0.780055</td>\n",
              "      <td>0.750137</td>\n",
              "      <td>-0.257237</td>\n",
              "      <td>0.034507</td>\n",
              "      <td>0.005168</td>\n",
              "      <td>4.99</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3098</th>\n",
              "      <td>11.0</td>\n",
              "      <td>1.069374</td>\n",
              "      <td>0.287722</td>\n",
              "      <td>0.828613</td>\n",
              "      <td>2.712520</td>\n",
              "      <td>-0.178398</td>\n",
              "      <td>0.337544</td>\n",
              "      <td>-0.096717</td>\n",
              "      <td>0.115982</td>\n",
              "      <td>-0.221083</td>\n",
              "      <td>0.460230</td>\n",
              "      <td>-0.773657</td>\n",
              "      <td>0.323387</td>\n",
              "      <td>-0.011076</td>\n",
              "      <td>-0.178485</td>\n",
              "      <td>-0.655564</td>\n",
              "      <td>-0.199925</td>\n",
              "      <td>0.124005</td>\n",
              "      <td>-0.980496</td>\n",
              "      <td>-0.982916</td>\n",
              "      <td>-0.153197</td>\n",
              "      <td>-0.036876</td>\n",
              "      <td>0.074412</td>\n",
              "      <td>-0.071407</td>\n",
              "      <td>0.104744</td>\n",
              "      <td>0.548265</td>\n",
              "      <td>0.104094</td>\n",
              "      <td>0.021491</td>\n",
              "      <td>0.021293</td>\n",
              "      <td>27.50</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7017</th>\n",
              "      <td>41.0</td>\n",
              "      <td>1.138759</td>\n",
              "      <td>-1.192953</td>\n",
              "      <td>1.407131</td>\n",
              "      <td>-0.330070</td>\n",
              "      <td>-2.069503</td>\n",
              "      <td>-0.242175</td>\n",
              "      <td>-1.306635</td>\n",
              "      <td>0.104510</td>\n",
              "      <td>0.134628</td>\n",
              "      <td>0.493931</td>\n",
              "      <td>-0.895188</td>\n",
              "      <td>-0.182695</td>\n",
              "      <td>0.146081</td>\n",
              "      <td>-0.586611</td>\n",
              "      <td>0.797189</td>\n",
              "      <td>-0.891721</td>\n",
              "      <td>-0.079208</td>\n",
              "      <td>1.541588</td>\n",
              "      <td>-0.983586</td>\n",
              "      <td>-0.299307</td>\n",
              "      <td>-0.156198</td>\n",
              "      <td>-0.030569</td>\n",
              "      <td>-0.019723</td>\n",
              "      <td>0.433753</td>\n",
              "      <td>-0.029521</td>\n",
              "      <td>1.141241</td>\n",
              "      <td>-0.008612</td>\n",
              "      <td>0.041564</td>\n",
              "      <td>96.94</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7868</th>\n",
              "      <td>41.0</td>\n",
              "      <td>1.145524</td>\n",
              "      <td>0.575068</td>\n",
              "      <td>0.194008</td>\n",
              "      <td>2.598192</td>\n",
              "      <td>-0.092210</td>\n",
              "      <td>-1.044430</td>\n",
              "      <td>0.531588</td>\n",
              "      <td>-0.241888</td>\n",
              "      <td>-0.896287</td>\n",
              "      <td>0.757952</td>\n",
              "      <td>-0.448937</td>\n",
              "      <td>-0.660863</td>\n",
              "      <td>-1.308522</td>\n",
              "      <td>0.788864</td>\n",
              "      <td>0.320294</td>\n",
              "      <td>0.295404</td>\n",
              "      <td>-0.287878</td>\n",
              "      <td>-0.451453</td>\n",
              "      <td>-1.011446</td>\n",
              "      <td>-0.191050</td>\n",
              "      <td>0.011106</td>\n",
              "      <td>-0.119703</td>\n",
              "      <td>-0.076510</td>\n",
              "      <td>0.691320</td>\n",
              "      <td>0.633984</td>\n",
              "      <td>0.048741</td>\n",
              "      <td>-0.053192</td>\n",
              "      <td>0.016251</td>\n",
              "      <td>34.13</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4856</th>\n",
              "      <td>49.0</td>\n",
              "      <td>-0.549626</td>\n",
              "      <td>0.418949</td>\n",
              "      <td>1.729833</td>\n",
              "      <td>0.203065</td>\n",
              "      <td>-0.187012</td>\n",
              "      <td>0.253878</td>\n",
              "      <td>0.500894</td>\n",
              "      <td>0.251256</td>\n",
              "      <td>-0.227985</td>\n",
              "      <td>-0.576169</td>\n",
              "      <td>1.102032</td>\n",
              "      <td>0.823708</td>\n",
              "      <td>-0.569510</td>\n",
              "      <td>0.008710</td>\n",
              "      <td>-1.041414</td>\n",
              "      <td>-0.603403</td>\n",
              "      <td>0.225484</td>\n",
              "      <td>-0.352133</td>\n",
              "      <td>0.194946</td>\n",
              "      <td>0.016970</td>\n",
              "      <td>0.115062</td>\n",
              "      <td>0.418529</td>\n",
              "      <td>-0.065133</td>\n",
              "      <td>0.264981</td>\n",
              "      <td>0.003958</td>\n",
              "      <td>0.395969</td>\n",
              "      <td>0.027182</td>\n",
              "      <td>0.043506</td>\n",
              "      <td>59.99</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12815</th>\n",
              "      <td>172670.0</td>\n",
              "      <td>1.939795</td>\n",
              "      <td>-0.969791</td>\n",
              "      <td>-0.587017</td>\n",
              "      <td>-1.309403</td>\n",
              "      <td>-1.126185</td>\n",
              "      <td>-0.986735</td>\n",
              "      <td>-0.619530</td>\n",
              "      <td>-0.143709</td>\n",
              "      <td>2.387882</td>\n",
              "      <td>-0.995528</td>\n",
              "      <td>-0.736618</td>\n",
              "      <td>0.788915</td>\n",
              "      <td>0.061652</td>\n",
              "      <td>-0.149521</td>\n",
              "      <td>1.089282</td>\n",
              "      <td>-0.542341</td>\n",
              "      <td>-0.197894</td>\n",
              "      <td>0.215365</td>\n",
              "      <td>0.480936</td>\n",
              "      <td>-0.094239</td>\n",
              "      <td>0.285527</td>\n",
              "      <td>1.056616</td>\n",
              "      <td>0.023114</td>\n",
              "      <td>0.058599</td>\n",
              "      <td>-0.057562</td>\n",
              "      <td>-0.066172</td>\n",
              "      <td>0.031900</td>\n",
              "      <td>-0.035789</td>\n",
              "      <td>59.85</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7239</th>\n",
              "      <td>172694.0</td>\n",
              "      <td>-0.816768</td>\n",
              "      <td>1.295381</td>\n",
              "      <td>-1.336395</td>\n",
              "      <td>-0.290017</td>\n",
              "      <td>0.877028</td>\n",
              "      <td>-0.639750</td>\n",
              "      <td>1.637290</td>\n",
              "      <td>0.031620</td>\n",
              "      <td>-1.334510</td>\n",
              "      <td>-1.075689</td>\n",
              "      <td>1.065118</td>\n",
              "      <td>0.716362</td>\n",
              "      <td>0.380555</td>\n",
              "      <td>-0.420083</td>\n",
              "      <td>-1.088048</td>\n",
              "      <td>-0.121209</td>\n",
              "      <td>0.858278</td>\n",
              "      <td>0.500646</td>\n",
              "      <td>0.601820</td>\n",
              "      <td>-0.052537</td>\n",
              "      <td>0.240530</td>\n",
              "      <td>0.615241</td>\n",
              "      <td>-0.269590</td>\n",
              "      <td>0.767657</td>\n",
              "      <td>0.139596</td>\n",
              "      <td>0.669905</td>\n",
              "      <td>-0.139549</td>\n",
              "      <td>0.060390</td>\n",
              "      <td>127.69</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2852</th>\n",
              "      <td>172721.0</td>\n",
              "      <td>-0.947373</td>\n",
              "      <td>-0.059861</td>\n",
              "      <td>1.537605</td>\n",
              "      <td>0.117118</td>\n",
              "      <td>-0.315440</td>\n",
              "      <td>0.505595</td>\n",
              "      <td>0.342234</td>\n",
              "      <td>0.244829</td>\n",
              "      <td>-1.557720</td>\n",
              "      <td>0.613320</td>\n",
              "      <td>0.031953</td>\n",
              "      <td>-0.341449</td>\n",
              "      <td>-0.500256</td>\n",
              "      <td>0.216078</td>\n",
              "      <td>0.003889</td>\n",
              "      <td>-1.537024</td>\n",
              "      <td>-0.208283</td>\n",
              "      <td>2.281660</td>\n",
              "      <td>0.194544</td>\n",
              "      <td>0.086385</td>\n",
              "      <td>-0.239290</td>\n",
              "      <td>-0.367110</td>\n",
              "      <td>-0.034898</td>\n",
              "      <td>-0.561550</td>\n",
              "      <td>0.591367</td>\n",
              "      <td>-0.340983</td>\n",
              "      <td>0.310727</td>\n",
              "      <td>0.140940</td>\n",
              "      <td>138.00</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8938</th>\n",
              "      <td>172744.0</td>\n",
              "      <td>-0.954967</td>\n",
              "      <td>1.114359</td>\n",
              "      <td>1.780434</td>\n",
              "      <td>-0.361532</td>\n",
              "      <td>-0.407570</td>\n",
              "      <td>-0.406413</td>\n",
              "      <td>0.210387</td>\n",
              "      <td>0.429851</td>\n",
              "      <td>-0.173493</td>\n",
              "      <td>-1.269369</td>\n",
              "      <td>-0.960160</td>\n",
              "      <td>0.459340</td>\n",
              "      <td>0.586295</td>\n",
              "      <td>0.101810</td>\n",
              "      <td>0.578975</td>\n",
              "      <td>0.046778</td>\n",
              "      <td>-0.214283</td>\n",
              "      <td>-0.004449</td>\n",
              "      <td>0.429980</td>\n",
              "      <td>-0.086966</td>\n",
              "      <td>-0.127048</td>\n",
              "      <td>-0.424787</td>\n",
              "      <td>-0.204041</td>\n",
              "      <td>0.025486</td>\n",
              "      <td>0.345074</td>\n",
              "      <td>-0.418458</td>\n",
              "      <td>-0.046577</td>\n",
              "      <td>0.014827</td>\n",
              "      <td>9.99</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1399</th>\n",
              "      <td>172768.0</td>\n",
              "      <td>-0.669662</td>\n",
              "      <td>0.923769</td>\n",
              "      <td>-1.543167</td>\n",
              "      <td>-1.560729</td>\n",
              "      <td>2.833960</td>\n",
              "      <td>3.240843</td>\n",
              "      <td>0.181576</td>\n",
              "      <td>1.282746</td>\n",
              "      <td>-0.893890</td>\n",
              "      <td>-1.453432</td>\n",
              "      <td>0.187488</td>\n",
              "      <td>-0.390794</td>\n",
              "      <td>-0.289171</td>\n",
              "      <td>-0.510320</td>\n",
              "      <td>0.955637</td>\n",
              "      <td>0.553781</td>\n",
              "      <td>0.567862</td>\n",
              "      <td>0.409517</td>\n",
              "      <td>-0.671301</td>\n",
              "      <td>0.000965</td>\n",
              "      <td>0.183856</td>\n",
              "      <td>0.202670</td>\n",
              "      <td>-0.373023</td>\n",
              "      <td>0.651122</td>\n",
              "      <td>1.073823</td>\n",
              "      <td>0.844590</td>\n",
              "      <td>-0.286676</td>\n",
              "      <td>-0.187719</td>\n",
              "      <td>40.00</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>14680 rows × 31 columns</p>\n",
              "</div>"
            ],
            "text/plain": [
              "           Time        V1        V2  ...       V28  Amount  Class\n",
              "13869       4.0  1.229658  0.141004  ...  0.005168    4.99      0\n",
              "3098       11.0  1.069374  0.287722  ...  0.021293   27.50      0\n",
              "7017       41.0  1.138759 -1.192953  ...  0.041564   96.94      0\n",
              "7868       41.0  1.145524  0.575068  ...  0.016251   34.13      0\n",
              "4856       49.0 -0.549626  0.418949  ...  0.043506   59.99      0\n",
              "...         ...       ...       ...  ...       ...     ...    ...\n",
              "12815  172670.0  1.939795 -0.969791  ... -0.035789   59.85      0\n",
              "7239   172694.0 -0.816768  1.295381  ...  0.060390  127.69      0\n",
              "2852   172721.0 -0.947373 -0.059861  ...  0.140940  138.00      0\n",
              "8938   172744.0 -0.954967  1.114359  ...  0.014827    9.99      0\n",
              "1399   172768.0 -0.669662  0.923769  ... -0.187719   40.00      0\n",
              "\n",
              "[14680 rows x 31 columns]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 19
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "rqMt8d_h1OjX",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Scale time and amount values\n",
        "time_scaler = MinMaxScaler()\n",
        "amt_scaler = MinMaxScaler()\n",
        "\n",
        "bq_fraud_data['Time'] = time_scaler.fit_transform(bq_fraud_data['Time'].values.reshape(-1,1))\n",
        "bq_fraud_data['Amount'] = amt_scaler.fit_transform(bq_fraud_data['Amount'].values.reshape(-1,1))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CfYUpOoRzEG-",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 68
        },
        "outputId": "22f5fbb0-d09e-4542-9280-3abc4734f94d"
      },
      "source": [
        "# See data balance\n",
        "bq_fraud_data['Class'].value_counts()"
      ],
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0    14188\n",
              "1      492\n",
              "Name: Class, dtype: int64"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 21
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1PlJtk_wxHZZ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "train_test_split = int(len(bq_fraud_data) * .8)\n",
        "\n",
        "# Copy the slices so that pop() below modifies independent DataFrames\n",
        "train_data = bq_fraud_data[:train_test_split].copy()\n",
        "test_data = bq_fraud_data[train_test_split:].copy()\n",
        "\n",
        "train_labels = train_data.pop('Class')\n",
        "test_labels = test_data.pop('Class')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZjRK6XsiGmtX",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a tf dataset\n",
        "train_dataset = tf.data.Dataset.from_tensor_slices((train_data.values, train_labels))\n",
        "train_dataset = train_dataset.shuffle(len(train_data)).batch(1024)\n",
        "\n",
        "test_dataset = tf.data.Dataset.from_tensor_slices((test_data.values, test_labels))\n",
        "test_dataset = test_dataset.shuffle(len(test_data)).batch(1)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O45AF1OY5zSH",
        "colab_type": "text"
      },
      "source": [
        "Next, we'll train with weighted classes and add a bias initializer to the output layer. First, calculate the class weights."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zGwni9mvze_w",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Get number of examples for each class from the training set\n",
        "num_minority = train_labels.value_counts()[1]\n",
        "num_majority = train_labels.value_counts()[0]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "86nzMWSo52v8",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 51
        },
        "outputId": "0b0505ea-ac5b-47cc-d178-d24be3708ff4"
      },
      "source": [
        "minority_class_weight = 1 / (num_minority / len(train_data)) / 2\n",
        "majority_class_weight = 1 / (num_majority / len(train_data)) / 2\n",
        "\n",
        "# Pass the weights to Keras in a dict\n",
        "# The key is the index of each class\n",
        "keras_class_weights = {0: majority_class_weight, 1: minority_class_weight}\n",
        "print(keras_class_weights)\n",
        "\n",
        "# Calculate output bias\n",
        "output_bias = math.log(num_minority / num_majority)\n",
        "print(output_bias)"
      ],
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "{0: 0.5218627799502311, 1: 11.934959349593496}\n",
            "-3.1298224531174395\n"
          ],
          "name": "stdout"
        }
      ]
    },
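    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "A short sketch of the arithmetic behind the class weights and output bias computed above, in notation of our own (the symbols $N$, $n_c$, $w_c$ and $b$ are assumptions introduced here, not names used in the code): let $N$ be the number of training examples and $n_0$, $n_1$ the majority and minority counts.\n",
        "\n",
        "$$w_c = \\frac{N}{2\\,n_c}, \\qquad n_0 w_0 + n_1 w_1 = \\frac{N}{2} + \\frac{N}{2} = N$$\n",
        "\n",
        "so each class contributes half of the total weight, and the total matches an unweighted run. For the output bias,\n",
        "\n",
        "$$b = \\ln\\frac{n_1}{n_0} \\quad\\Rightarrow\\quad \\sigma(b) = \\frac{1}{1 + e^{-b}} = \\frac{n_1}{n_0 + n_1}$$\n",
        "\n",
        "which means the bias alone places the initial sigmoid output at the fraction of fraud examples in the training set rather than near 0.5."
      ]
    },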
    {
      "cell_type": "code",
      "metadata": {
        "id": "4jx8DvQzCLVh",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "fraud_model = keras.Sequential([\n",
        "    keras.layers.Dense(16, input_shape=(len(train_data.iloc[0]),), activation='relu'),\n",
        "    keras.layers.Dropout(0.25),\n",
        "    keras.layers.Dense(16, activation='relu'),\n",
        "    keras.layers.Dense(1, activation='sigmoid', bias_initializer=tf.keras.initializers.Constant(output_bias))\n",
        "])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Im0x0p8BkME9",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "metrics = [\n",
        "      tf.keras.metrics.BinaryAccuracy(name='accuracy'),\n",
        "      tf.keras.metrics.Precision(name='precision'),\n",
        "      tf.keras.metrics.Recall(name='recall'),\n",
        "      tf.keras.metrics.AUC(name='roc_auc'),\n",
        "]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qO1kTv5gCoa8",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "fraud_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=metrics)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XV9Rtxka52zN",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "fraud_model.fit(train_dataset, validation_data=test_dataset, epochs=10, class_weight=keras_class_weights)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QtJZSoE_pzzr",
        "colab_type": "text"
      },
      "source": [
        "#### Reframing: using cluster distance as a prediction signal\n",
        "\n",
        "In this approach, we train a clustering model and use the distance of new examples from the cluster centroids to detect anomalies. We'll train a k-means model on the natality dataset to demonstrate this."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "VjVdxkGRqFYX",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 32
        },
        "outputId": "fe208201-fd72-4a85-f816-2945f7b08e8b"
      },
      "source": [
        "%%bigquery --project sara-cloud-ml\n",
        "-- This will take about a minute to run\n",
        "CREATE OR REPLACE MODEL\n",
        "  `sara-cloud-ml.natality.baby_weight_clusters` OPTIONS(model_type='kmeans',\n",
        "    num_clusters=4) AS\n",
        "SELECT\n",
        "  weight_pounds,\n",
        "  mother_age,\n",
        "  gestation_weeks\n",
        "FROM\n",
        "  `bigquery-public-data.samples.natality`\n",
        "LIMIT 10000"
      ],
      "execution_count": 124,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 124
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l0Rl8S5Hskh3",
        "colab_type": "text"
      },
      "source": [
        "First, let's look at the cluster prediction results for an \"average\" example from our dataset."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "kMKEiKwEscw4",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%%bigquery average_pred --project sara-cloud-ml\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.PREDICT (MODEL `sara-cloud-ml.natality.baby_weight_clusters`,\n",
        "    (\n",
        "    SELECT\n",
        "      7.0 as weight_pounds,\n",
        "      28 as mother_age,\n",
        "      40 as gestation_weeks \n",
        "     )\n",
        "  )"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ysXdWdXbtLzg",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 80
        },
        "outputId": "7ba806c7-8914-4ad9-c34e-fe2a4b9fc0ef"
      },
      "source": [
        "average_pred"
      ],
      "execution_count": 136,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>CENTROID_ID</th>\n",
              "      <th>NEAREST_CENTROIDS_DISTANCE</th>\n",
              "      <th>weight_pounds</th>\n",
              "      <th>mother_age</th>\n",
              "      <th>gestation_weeks</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>[{'CENTROID_ID': 1, 'DISTANCE': 0.764157285801...</td>\n",
              "      <td>7.0</td>\n",
              "      <td>28</td>\n",
              "      <td>40</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   CENTROID_ID  ... gestation_weeks\n",
              "0            1  ...              40\n",
              "\n",
              "[1 rows x 5 columns]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 136
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qGAh1SJTtRzQ",
        "colab_type": "text"
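      },
      "source": [
        "Here, it's fairly clear that this datapoint belongs in cluster 1: its distance to that cluster's centroid (about 0.76) is much smaller than its distance to any other centroid. We can confirm this by printing the full list of centroid distances."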
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "AM6AMX8Os211",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 85
        },
        "outputId": "424c0d8e-534c-48e7-ebee-83e29ddd0d01"
      },
      "source": [
        "# Print the distance from the average example to each cluster centroid\n",
        "average_pred['NEAREST_CENTROIDS_DISTANCE'].iloc[0]"
      ],
      "execution_count": 133,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[{'CENTROID_ID': 1, 'DISTANCE': 0.7641572858019843},\n",
              " {'CENTROID_ID': 3, 'DISTANCE': 1.8753318107958212},\n",
              " {'CENTROID_ID': 2, 'DISTANCE': 2.443585441159741},\n",
              " {'CENTROID_ID': 4, 'DISTANCE': 3.529034745170229}]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 133
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e9H0ZdxTtYGs",
        "colab_type": "text"
      },
      "source": [
        "Let's compare this with the cluster prediction for an outlier example: a baby weighing 3 pounds, born at 27 weeks of gestation."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nI5Cw4KFs97A",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%%bigquery outlier_pred --project sara-cloud-ml\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.PREDICT (MODEL `sara-cloud-ml.natality.baby_weight_clusters`,\n",
        "    (\n",
        "    SELECT\n",
        "      3.0 as weight_pounds,\n",
        "      20 as mother_age,\n",
        "      27 as gestation_weeks \n",
        "     )\n",
        "  )"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "FsCO01VftDCJ",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 80
        },
        "outputId": "0d7f6054-b9cc-48d9-96cf-7f00a92428c4"
      },
      "source": [
        "outlier_pred"
      ],
      "execution_count": 138,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>CENTROID_ID</th>\n",
              "      <th>NEAREST_CENTROIDS_DISTANCE</th>\n",
              "      <th>weight_pounds</th>\n",
              "      <th>mother_age</th>\n",
              "      <th>gestation_weeks</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>3</td>\n",
              "      <td>[{'CENTROID_ID': 3, 'DISTANCE': 3.726026043962...</td>\n",
              "      <td>3.0</td>\n",
              "      <td>20</td>\n",
              "      <td>27</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   CENTROID_ID  ... gestation_weeks\n",
              "0            3  ...              27\n",
              "\n",
              "[1 rows x 5 columns]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 138
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xBrYP7F8toXo",
        "colab_type": "text"
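      },
      "source": [
        "Here, the distance to every cluster centroid is large: even the nearest centroid (cluster 3, about 3.73 away) is farther than the farthest centroid was for the average example. We can use this to conclude that this example might be an anomaly."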
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bjZkP5AytjeB",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 85
        },
        "outputId": "473fa579-16c3-4ca1-8886-781c4f44e17a"
      },
      "source": [
        "# Print the distance from the outlier example to each cluster centroid\n",
        "outlier_pred['NEAREST_CENTROIDS_DISTANCE'].iloc[0]"
      ],
      "execution_count": 139,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[{'CENTROID_ID': 3, 'DISTANCE': 3.726026043962655},\n",
              " {'CENTROID_ID': 1, 'DISTANCE': 3.8571677726904228},\n",
              " {'CENTROID_ID': 2, 'DISTANCE': 5.277867231738564},\n",
              " {'CENTROID_ID': 4, 'DISTANCE': 5.719630555422304}]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 139
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xgVC2XKCU6ys",
        "colab_type": "text"
      },
      "source": [
        "Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
      ]
    }
  ]
}